Yeast expression vectors for production of ITF

ABSTRACT

The invention features ITF expression vectors and methods of producing ITF.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 60/702,583, filed Jul. 25, 2005, which is hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Intestinal trefoil factor (“ITF”) expression vectors and methods of overexpressing ITF have been described (see, for example, Thim et al., Biochemistry, 34:4757-4764, 1995, and Kanai et al., Proc. Natl. Acad. Sci. USA, 95:178-182, 1998). One useful expression system involves the use of the yeast Pichia pastoris, which allows overexpression of heterologous genes and secretion of the encoded proteins at high levels (see, for example, Tschopp et al., Bio/Technology, 5:1305-1308, 1987, and Cregg et al., Bio/Technology, 11:905-910, 1993).

A need exists in the art for new methods of producing ITF in quantity, especially without extraneous amino acid sequences (FIGS. 1-2). The present invention provides such vectors and methods.

SUMMARY OF THE INVENTION

In one aspect, the invention features an expression vector including a nucleic acid encoding a biologically active intestinal trefoil factor (ITF) polypeptide. This expression vector encodes a polypeptide that includes an N-terminal fusion sequence, a protease cleavage site, and the ITF polypeptide, such that the cleavage site is contiguous with the ITF polypeptide and is recognized by a protease that cleaves immediately C-terminal to the cleavage site. In particular embodiments, the N-terminal fusion sequence includes an MFα signal sequence or an MFα presequence.

In particular embodiments, the cleavage site is recognized by the protease KEX2. Desirably, the cleavage site includes the amino acid sequence of SEQ ID NO: 401.

In particular embodiments, the cleavage site is recognized by yeast aspartic protease (Yap3), Type IV dipeptidyl aminopeptidase (DPAP), yeast glycosyl-phosphatidylinositol-linked aspartyl protease (Mkc 7), pepsin, trypsin, chymotrypsin, or subtilisin.

In particular embodiments, the ITF polypeptide used in the vector of the invention is hITF₁₅₋₇₃. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 111. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 3.

In particular embodiments, the ITF polypeptide used in the vector of the invention is hITF₁₋₇₃. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 116. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 4.

In particular embodiments, the ITF polypeptide used in the vector of the invention is hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, or hITF₂₅₋₇₃.

In particular embodiments, the ITF polypeptide used in the vector of the invention is pITF₂₂₋₈₀ or pITF₁₋₈₀. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 112 or SEQ ID NO: 117.

In particular embodiments, the ITF polypeptide used in the vector of the invention is dITF₂₂₋₈₀ or dITF₁₋₈₀. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 113 or SEQ ID NO: 118.

In particular embodiments, the ITF polypeptide used in the vector of the invention is rITF₂₃₋₈₁ or rITF₁₋₈₁. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 114 or SEQ ID NO: 119.

In particular embodiments, the ITF polypeptide used in the vector of the invention is mITF₂₃₋₈₁ or mITF₁₋₈₁. Desirably, the vector includes a nucleic acid having the sequence of SEQ ID NO: 115 or SEQ ID NO: 120.

In a second aspect, the invention features an expression vector that includes a nucleic acid having at least 90% sequence identity to nucleotides 1 to 1,191 of SEQ ID NO: 2, a nucleic acid encoding a biologically active ITF polypeptide, and a nucleic acid having at least 90% sequence identity to nucleotides 1,218 to 8,001 of SEQ ID NO: 2.

In a third aspect, the invention features an expression vector that includes a nucleic acid having at least 90% sequence identity to nucleotides 1 to 1,008 of SEQ ID NO: 5, a nucleic acid encoding a biologically active ITF polypeptide, and a nucleic acid having at least 90% sequence identity to nucleotides 1,035 to 7,818 of SEQ ID NO: 5.

In a fourth aspect, the invention features a cell transformed with the vector of any of the previous aspects.

In a fifth aspect, the invention features a composition that includes a cell transformed with the vector of any of the previous aspects and a cell culture medium. In particular embodiments of the above aspect, the cell is Pichia pastoris. Desirably, the cell is a (Mut+) GS115 strain or a (his4-) GS115 strain.

In a sixth aspect, the invention features a method of culturing a cell of the fourth aspect so as to express the encoded ITF polypeptide and recover this polypeptide from the culture medium. In particular embodiments of the above aspect, the ITF polypeptide is secreted from the cell. Desirably, the expressed polypeptide is proteolytically processed in vivo prior to secretion from said cell, resulting in secretion of the ITF polypeptide substantially free of extraneous residues. Alternatively, the secreted polypeptide is contacted with a purified proteolytic enzyme in a reaction chamber, thereby producing the ITF polypeptide substantially free of extraneous residues.

In a seventh aspect, the invention features a method of culturing a cell of the fourth aspect so as to express the encoded ITF polypeptide and recover this polypeptide from the culture medium. In particular embodiments of the above aspect, the ITF polypeptide is secreted from the cell. Desirably, the expressed polypeptide is proteolytically processed in vivo prior to secretion from said cell. Alternatively, the secreted polypeptide is contacted with a purified proteolytic enzyme in a reaction chamber.

In an eighth aspect, the invention features a method for producing a biologically active ITF polypeptide. This method includes the step of culturing yeast transformants containing recombinant plasmids encoding a polypeptide that includes an ITF polypeptide, such that the yeast produce and secrete the ITF polypeptide unaccompanied by an extraneous EA amino acid sequence. The method also includes steps of isolating and purifying the ITF polypeptide. In particular embodiments, the ITF polypeptide is hITF₁₅₋₇₃, hITF₁₋₇₃, hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, hITF₂₅₋₇₃, pITF₁₋₈₀, pITF₂₂₋₈₀, dITF₁₋₈₀, dITF₂₂₋₈₀, rITF₁₋₈₁, rITF₂₃₋₈₁, mITF₁₋₈₁, or mITF₂₃₋₈₁.

In a ninth aspect, the invention features a polypeptide that includes an N-terminal fusion sequence, a protease cleavage site, and the ITF polypeptide, such that the cleavage site is contiguous with the ITF polypeptide and is recognized by a protease that cleaves immediately C-terminal to the cleavage site. In particular embodiments, the sequence of the polypeptide comprises SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO: 308, SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316, SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, or SEQ ID NO: 320.

In a tenth aspect, the invention features an expression vector that includes a nucleic acid having the sequence of SEQ ID NO: 2.

In an eleventh aspect, the invention features an expression vector that includes a nucleic acid having the sequence of SEQ ID NO: 5.

By “intestinal trefoil factor” (“ITF”) is meant any protein that is substantially homologous to human ITF (FIG. 14A) and that is expressed in the large intestine, small intestine, or colon to a greater extent than it is expressed in tissues other than the small intestine, large intestine, or colon. Also included are: allelic variations; natural mutants; induced mutants; proteins encoded by DNA that hybridizes under high or low stringency conditions to ITF encoding nucleic acids retrieved from naturally occurring material; and polypeptides or proteins retrieved by antisera to ITF, especially by antisera to the active site or binding domain of ITF. The term also includes other chimeric polypeptides that include an ITF.

In addition to substantially full-length polypeptides, the term ITF, as used herein, includes biologically active fragments of the polypeptides. As used herein, the term “fragment,” which applies to a polypeptide unless otherwise indicated, will ordinarily be at least 10 contiguous amino acids, typically at least 20 contiguous amino acids, more typically at least 30 contiguous amino acids, usually at least 40 contiguous amino acids, preferably at least 50 contiguous amino acids, and most preferably 59 or more contiguous amino acids in length. Fragments of ITF can be generated by methods known to those skilled in the art and described herein. The ability of a candidate fragment to exhibit a biological activity of ITF can be assessed by methods known to those skilled in the art and described herein. Also included in the term “fragment” are biologically active ITF polypeptides containing amino acids that are normally removed during protein processing, including additional amino acids that are not required for the biological activity of the polypeptide, or including additional amino acids that result from alternative mRNA splicing or alternative protein processing events.

An ITF polypeptide, fragment, or analog is biologically active if it exhibits a biological activity of a naturally occurring ITF, e.g., the ability to alter gastrointestinal motility in a mammal, the ability to restitute gastrointestinal, respiratory, or uterine epithelium, or the ability to enhance dermal or corneal epithelial wound healing.

Particularly useful ITF polypeptides that retain biological activity include the polypeptides corresponding to amino acid residues 1-73 of SEQ ID NO: 301 (full-length human ITF, also designated hITF₁₋₇₃), amino acid residues 15-73 of SEQ ID NO: 301 (hITF₁₅₋₇₃), amino acid residues 21-62 of SEQ ID NO: 301 (hITF₂₁₋₆₂), amino acid residues 21-70 of SEQ ID NO: 301 (hITF₂₁₋₇₀), amino acid residues 21-72 of SEQ ID NO: 301 (hITF₂₁₋₇₂), amino acid residues 21-73 of SEQ ID NO: 301 (hITF₂₁₋₇₃), amino acid residues 22-62 of SEQ ID NO: 301 (hITF₂₂₋₆₂), amino acid residues 22-70 of SEQ ID NO: 301 (hITF₂₂₋₇₀), amino acid residues 22-72 of SEQ ID NO: 301 (hITF₂₂₋₇₂), amino acid residues 22-73 of SEQ ID NO: 301 (hITF₂₂₋₇₃), amino acid residues 25-62 of SEQ ID NO: 301 (hITF₂₅₋₆₂), amino acid residues 25-70 of SEQ ID NO: 301 (hITF₂₅₋₇₀), amino acid residues 25-72 of SEQ ID NO: 301 (hITF₂₅₋₇₂), amino acid residues 25-73 of SEQ ID NO: 301 (hITF₂₅₋₇₃), amino acid residues 1-80 of SEQ ID NO: 302 (full-length pig ITF, also designated pITF₁₋₈₀), amino acid residues 22-80 of SEQ ID NO: 302 (pITF₂₂₋₈₀), amino acid residues 1-80 of SEQ ID NO: 303 (full-length dog ITF, also designated dITF₁₋₈₀), amino acid residues 22-80 of SEQ ID NO: 303 (dITF₂₂₋₈₀), amino acid residues 1-81 of SEQ ID NO: 304 (full-length rat ITF, also designated rITF₁₋₈₁), amino acid residues 23-81 of SEQ ID NO: 304 (rITF₂₃₋₈₁), amino acid residues 1-81 of SEQ ID NO: 305 (full-length mouse ITF, also designated mITF₁₋₈₁), and amino acid residues 23-81 of SEQ ID NO: 305 (mITF₂₃₋₈₁).
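The subscript notation used above corresponds to 1-based, inclusive residue positions in the full-length sequence. As a minimal sketch of that convention (the helper name itf_fragment and the placeholder sequence are hypothetical and are not part of the sequence listing), a fragment such as hITF₁₅₋₇₃ could be extracted as follows:

```python
def itf_fragment(full_length: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive) of a full-length sequence."""
    if not (1 <= first <= last <= len(full_length)):
        raise ValueError("residue range lies outside the full-length sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1 and last.
    return full_length[first - 1:last]


# Placeholder 73-residue string standing in for SEQ ID NO: 301.
# It is NOT the real sequence; only its length matters for this illustration.
placeholder_hitf = "X" * 73

fragment_15_73 = itf_fragment(placeholder_hitf, 15, 73)
fragment_length = len(fragment_15_73)  # 73 - 15 + 1 residues
```

Under this convention, residues 15-73 of a 73-residue sequence span 59 residues.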

By “MFα prepropeptide sequence” or “MFα signal sequence” is meant the DNA sequence spanned by nucleotides 949 to 1,191 of SEQ ID NO: 2 or the protein sequence encoded therein.

By “MFα presequence” is meant the DNA sequence spanned by nucleotides 949 to 1,008 of SEQ ID NO: 5 or the protein sequence encoded therein.

By “modify,” when applied to an expression vector, is meant to alter the sequence of such a vector by addition, deletion, or mutation of nucleotides. A modified vector will generally exhibit at least 70%, more preferably 80%, more preferably 90%, and most preferably 95% or even 99% sequence identity with the unmodified vector.

By “N-terminal fusion sequence” is meant an amino acid sequence fused to the N terminus of a protein of interest, or a nucleotide sequence encoding such an amino acid sequence. This could include the MFα signal sequence, the MFα presequence, or other signal or leader sequences. The term also encompasses fusion partners that enhance size, solubility, or other desirable characteristics of the expressed fusion protein.

By “polypeptide” or “peptide” or “protein” is meant any chain of at least two naturally-occurring amino acids, or unnatural amino acids (i.e., those amino acids that do not occur in nature) regardless of post-translational modification (e.g., glycosylation or phosphorylation), constituting all or part of a naturally-occurring or unnatural polypeptide or peptide, as is described herein.

Polypeptides or derivatives thereof may be fused or attached to another protein or peptide, for example, as an α-Factor signal sequence fusion polypeptide.

Sequence identity is typically measured using sequence analysis software with the default parameters specified therein (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705). This software program matches similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications. Conservative substitutions typically include substitutions within the following groups: glycine, alanine, valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
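For illustration only, a percent-identity threshold such as the 90% figure used herein can be checked over a pre-aligned pair of sequences as sketched below. This is a simplified stand-in and not the algorithm of the software package named above: it assumes the two sequences have already been aligned to equal length with "-" marking gaps, it counts gapped columns as mismatches, and the function names percent_identity and meets_threshold are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns; columns with a gap count as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the aligned pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other conventions for the denominator (for example, excluding gapped columns) give different values, so the choice should be stated whenever a threshold is applied.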

By “protease” or “proteolytic enzyme” is meant an enzyme that catalyzes the cleavage, or proteolysis, of proteins into smaller peptide fractions and amino acids. Some proteases recognize and bind to particular amino acid sequences and cleave specific peptide bonds within or outside the recognition sequence, while other proteases cleave nonspecifically.

By “protease cleavage site” or “protease recognition site” is meant an amino acid sequence to which a protease is capable of binding, thereby leading to proteolysis. Desirably, a protease will bind specifically to a corresponding recognition site.
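The case of a protease that cleaves immediately C-terminal to its recognition site, as KEX2 does at the LEKR site (SEQ ID NO: 401), can be sketched in a few lines. The function name cleave_after_site and the toy leader and payload sequences are hypothetical. The sketch also assumes a single relevant site and cuts after the last occurrence, whereas a real protease may act at every accessible site.

```python
from typing import Tuple

KEX2_SITE = "LEKR"  # KEX2 recognition and processing site (SEQ ID NO: 401)


def cleave_after_site(fusion: str, site: str = KEX2_SITE) -> Tuple[str, str]:
    """Split a fusion polypeptide immediately C-terminal to the cleavage site.

    Uses the last occurrence of the site, i.e. the one nearest the C terminus.
    Returns (leader including the site, released polypeptide).
    """
    index = fusion.rfind(site)
    if index == -1:
        raise ValueError("cleavage site not found in fusion polypeptide")
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


# Toy sequences for illustration only; they are not taken from the sequence listing.
toy_leader = "MKKSAA"
toy_payload = "GSGSGS"
leader, released = cleave_after_site(toy_leader + KEX2_SITE + toy_payload)
```

Because the site is contiguous with the payload, the released polypeptide carries no residues from the leader or from the site itself.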

By “vector” or “plasmid” is meant a DNA molecule into which fragments of DNA may be inserted or cloned. A vector will contain one or more unique restriction sites, and may be capable of autonomous replication in a defined host or vehicle organism such that the cloned sequence is reproducible.

By “expression vector” or “construct” is meant any autonomous element capable of replicating in a host independently of the host's chromosome, after additional sequences of DNA have been incorporated into the autonomous element's genome.

All nucleotide sequences presented herein should be understood to read from the 5′ end to the 3′ end unless otherwise indicated. Likewise, all amino acid sequences should be understood to read from the N-terminal end to the C-terminal end unless otherwise indicated.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of the hITF₁₅₋₇₃ dimer including the extraneous N-terminal residues Glu-Ala shown for one of the two molecules of the dimer.

FIG. 2 is a representation of an MFα-hITF₁₅₋₇₃ construct that includes the extraneous residues Glu-Ala, with cleavage sites for signal peptidase, KEX2, and STE13 indicated.

FIG. 3 is a map of an MFα-hITF₁₅₋₇₃ construct.

FIG. 4 is the nucleotide sequence of the yeast vector pPIC9 (SEQ ID NO: 1). The multiple cloning site is shown in bold.

FIG. 5 is the nucleotide sequence of the yeast vector pPICGIco (SEQ ID NO: 2). The modified multiple cloning site is shown in bold.

FIG. 6 is the nucleotide sequence of the yeast vector pPICGIco-hITF₁₅₋₇₃ (SEQ ID NO: 3), which encodes an MFα-hITF₁₅₋₇₃ fusion polypeptide lacking a Glu-Ala sequence between the MFα signal sequence and the hITF₁₅₋₇₃ sequence. A KEX2 recognition and processing site, LEKR (SEQ ID NO: 401), directly precedes the hITF₁₅₋₇₃ sequence. The restriction endonuclease sites XhoI (CTCGAG) and EcoRI (GAATTC) used for subcloning are underlined. The sequence encoding hITF₁₅₋₇₃ is shown in bold.

FIG. 7 is the nucleotide sequence of the yeast vector pPICGIco-hITF₁₋₇₃ (SEQ ID NO: 4), which encodes an MFα-ITF₁₋₇₃ fusion polypeptide lacking a Glu-Ala sequence between the MFα signal sequence and the ITF₁₋₇₃ sequence. A KEX2 recognition and processing site directly precedes the hITF₁₋₇₃ sequence. The restriction endonuclease sites XhoI (CTCGAG) and EcoRI (GAATTC) used for subcloning are underlined. The sequence encoding hITF₁₋₇₃ is shown in bold.

FIG. 8 is the nucleotide sequence of the yeast vector pPICpre (SEQ ID NO: 5). The multiple cloning site is shown in bold.

FIG. 9 is the nucleotide sequence of the yeast vector pPICpre-hITF₁₅₋₇₃ (SEQ ID NO: 6), which encodes an MFαpre-ITF₁₅₋₇₃ fusion polypeptide lacking a Glu-Ala sequence between the MFα presequence and the ITF₁₅₋₇₃ sequence. A KEX2 recognition and processing site directly precedes the hITF₁₅₋₇₃ sequence. The restriction endonuclease sites XhoI (CTCGAG) and EcoRI (GAATTC) used for subcloning are underlined. The sequence encoding hITF₁₅₋₇₃ is shown in bold.

FIG. 10 is the nucleotide sequence of the yeast vector pPICGIco-hITF₁₅₋₇₃ (SEQ ID NO: 7), which encodes an MFα-hITF₁₅₋₇₃ fusion polypeptide containing a Glu-Ala sequence between the MFα signal sequence and the hITF₁₅₋₇₃ sequence. A KEX2 recognition and processing site directly precedes the hITF₁₅₋₇₃ sequence. The sequence encoding hITF₁₅₋₇₃ is shown in bold.

FIG. 11A shows vector maps of pPIC9 and pPICGIco and describes the construction of pPICGIco.

FIG. 11B shows the multiple cloning site of pPICGIco.

FIG. 12A is a vector map of pPICpre.

FIG. 12B is the nucleotide sequence of a PCR-amplified DNA region that is designed for insertion into pPICGIco linearized at BamH I and Xho I sites, resulting in pPICpre. The PCR-amplified region corresponds to SEQ ID NO: 131.

FIG. 12C shows the multiple cloning site of pPICpre.

FIG. 13A is the nucleotide sequence of the human ITF cDNA (Genbank Accession No: BC017859) (SEQ ID NO: 101). Primer sequences are underlined. The sequence encoding hITF₁₋₇₃ is shown in bold.

FIG. 13B is the nucleotide sequence of the pig ITF cDNA (Genbank Accession No: F14493) (SEQ ID NO: 102). Primer sequences are underlined. The sequence encoding pITF₁₋₈₀ is shown in bold.

FIG. 13C is the nucleotide sequence of the dog ITF cDNA (Genbank Accession No: NM_001002990) (SEQ ID NO: 103). Primer sequences are underlined. The sequence encoding dITF₁₋₈₀ is shown in bold.

FIG. 13D is the nucleotide sequence of the rat ITF cDNA (Genbank Accession No: NM_013042) (SEQ ID NO: 104). Primer sequences are underlined. The sequence encoding rITF₁₋₈₁ is shown in bold.

FIG. 13E is the nucleotide sequence of the mouse ITF cDNA (Genbank Accession No: NM_011575) (SEQ ID NO: 105). Primer sequences are underlined. The sequence encoding mITF₁₋₈₁ is shown in bold.

FIG. 14A is the amino acid sequence of full-length human ITF (SEQ ID NO: 301). Particular residues are indicated by superscripts.

FIG. 14B is the amino acid sequence of full-length pig ITF (SEQ ID NO: 302). Particular residues are indicated by superscripts.

FIG. 14C is the amino acid sequence of full-length dog ITF (SEQ ID NO: 303). Particular residues are indicated by superscripts.

FIG. 14D is the amino acid sequence of full-length rat ITF (SEQ ID NO: 304). Particular residues are indicated by superscripts.

FIG. 14E is the amino acid sequence of full-length mouse ITF (SEQ ID NO: 305). Particular residues are indicated by superscripts.

FIG. 15A shows the subcloning into pCR2.1 of a PCR-amplified DNA region that includes the nucleotide sequence encoding full-length human ITF, resulting in pCR2.1-hITF. The PCR-amplified region corresponds to SEQ ID NO: 106.

FIG. 15B shows the subcloning into pCR2.1 of a PCR-amplified DNA region that includes the nucleotide sequence encoding full-length pig ITF, resulting in pCR2.1-pITF. The PCR-amplified region corresponds to SEQ ID NO: 107.

FIG. 15C shows the subcloning into pCR2.1 of a PCR-amplified DNA region that includes the nucleotide sequence encoding full-length dog ITF, resulting in pCR2.1-dITF. The PCR-amplified region corresponds to SEQ ID NO: 108.

FIG. 15D shows the subcloning into pCR2.1 of a PCR-amplified DNA region that includes the nucleotide sequence encoding full-length rat ITF, resulting in pCR2.1-rITF. The PCR-amplified region corresponds to SEQ ID NO: 109.

FIG. 15E shows the subcloning into pCR2.1 of a PCR-amplified DNA region that includes the nucleotide sequence encoding full-length mouse ITF, resulting in pCR2.1-mITF. The PCR-amplified region corresponds to SEQ ID NO: 110.

FIG. 16A shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding hITF₁₅₋₇₃, resulting in pPICGIco-hITF₁₅₋₇₃ (also referred to as pPICGIco-hITF). The PCR-amplified region corresponds to SEQ ID NO: 111.

FIG. 16B shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding pITF₂₂₋₈₀, resulting in pPICGIco-pITF₂₂₋₈₀ (also referred to as pPICGIco-pITF). The PCR-amplified region corresponds to SEQ ID NO: 112.

FIG. 16C shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding dITF₂₂₋₈₀, resulting in pPICGIco-dITF₂₂₋₈₀ (also referred to as pPICGIco-dITF). The PCR-amplified region corresponds to SEQ ID NO: 113.

FIG. 16D shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding rITF₂₃₋₈₁, resulting in pPICGIco-rITF₂₃₋₈₁ (also referred to as pPICGIco-rITF). The PCR-amplified region corresponds to SEQ ID NO: 114.

FIG. 16E shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding mITF₂₃₋₈₁, resulting in pPICGIco-mITF₂₃₋₈₁ (also referred to as pPICGIco-mITF). The PCR-amplified region corresponds to SEQ ID NO: 115.

FIG. 17A is the amino acid sequence of the MFα signal sequence-hITF₁₅₋₇₃ fusion (also referred to as MFα-hITF₁₅₋₇₃) (SEQ ID NO: 306). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and hITF₁₅₋₇₃ is shown in bold.

FIG. 17B is the amino acid sequence of the MFα signal sequence-pITF₂₂₋₈₀ fusion (also referred to as MFα-pITF₂₂₋₈₀) (SEQ ID NO: 307). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and pITF₂₂₋₈₀ is shown in bold.

FIG. 17C is the amino acid sequence of the MFα signal sequence-dITF₂₂₋₈₀ fusion (also referred to as MFα-dITF₂₂₋₈₀) (SEQ ID NO: 308). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and dITF₂₂₋₈₀ is shown in bold.

FIG. 17D is the amino acid sequence of the MFα signal sequence-rITF₂₃₋₈₁ fusion (also referred to as MFα-rITF₂₃₋₈₁) (SEQ ID NO: 309). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and rITF₂₃₋₈₁ is shown in bold.

FIG. 17E is the amino acid sequence of the MFα signal sequence-mITF₂₃₋₈₁ fusion (also referred to as MFα-mITF₂₃₋₈₁) (SEQ ID NO: 310). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and mITF₂₃₋₈₁ is shown in bold.

FIG. 18A shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding hITF₁₋₇₃, resulting in pPICGIco-hITF₁₋₇₃. The PCR-amplified region corresponds to SEQ ID NO: 116.

FIG. 18B shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding pITF₁₋₈₀, resulting in pPICGIco-pITF₁₋₈₀. The PCR-amplified region corresponds to SEQ ID NO: 117.

FIG. 18C shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding dITF₁₋₈₀, resulting in pPICGIco-dITF₁₋₈₀. The PCR-amplified region corresponds to SEQ ID NO: 118.

FIG. 18D shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding rITF₁₋₈₁, resulting in pPICGIco-rITF₁₋₈₁. The PCR-amplified region corresponds to SEQ ID NO: 119.

FIG. 18E shows the subcloning into pPICGIco of a PCR-amplified DNA region that includes the nucleotide sequence encoding mITF₁₋₈₁, resulting in pPICGIco-mITF₁₋₈₁. The PCR-amplified region corresponds to SEQ ID NO: 120.

FIG. 19A is the amino acid sequence of the MFα signal sequence-hITF₁₋₇₃ fusion (also referred to as MFα-hITF₁₋₇₃) (SEQ ID NO: 311). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and hITF₁₋₇₃ is shown in bold.

FIG. 19B is the amino acid sequence of the MFα signal sequence-pITF₁₋₈₀ fusion (also referred to as MFα-pITF₁₋₈₀) (SEQ ID NO: 312). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and pITF₁₋₈₀ is shown in bold.

FIG. 19C is the amino acid sequence of the MFα signal sequence-dITF₁₋₈₀ fusion (also referred to as MFα-dITF₁₋₈₀) (SEQ ID NO: 313). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and dITF₁₋₈₀ is shown in bold.

FIG. 19D is the amino acid sequence of the MFα signal sequence-rITF₁₋₈₁ fusion (also referred to as MFα-rITF₁₋₈₁) (SEQ ID NO: 314). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and rITF₁₋₈₁ is shown in bold.

FIG. 19E is the amino acid sequence of the MFα signal sequence-mITF₁₋₈₁ fusion (also referred to as MFα-mITF₁₋₈₁) (SEQ ID NO: 315). The MFα signal sequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and mITF₁₋₈₁ is shown in bold.

FIG. 20A shows the subcloning into pPICpre of a PCR-amplified DNA region that includes the nucleotide sequence encoding hITF₁₅₋₇₃, resulting in pPICpre-hITF₁₅₋₇₃ (also referred to as pPICpre-hITF). The PCR-amplified region corresponds to SEQ ID NO: 121.

FIG. 20B shows the subcloning into pPICpre of a PCR-amplified DNA region that includes the nucleotide sequence encoding pITF₂₂₋₈₀, resulting in pPICpre-pITF₂₂₋₈₀ (also referred to as pPICpre-pITF). The PCR-amplified region corresponds to SEQ ID NO: 122.

FIG. 20C shows the subcloning into pPICpre of a PCR-amplified DNA region that includes the nucleotide sequence encoding dITF₂₂₋₈₀, resulting in pPICpre-dITF₂₂₋₈₀ (also referred to as pPICpre-dITF). The PCR-amplified region corresponds to SEQ ID NO: 123.

FIG. 20D shows the subcloning into pPICpre of a PCR-amplified DNA region that includes the nucleotide sequence encoding rITF₂₃₋₈₁, resulting in pPICpre-rITF₂₃₋₈₁ (also referred to as pPICpre-rITF). The PCR-amplified region corresponds to SEQ ID NO: 124.

FIG. 20E shows the subcloning into pPICpre of a PCR-amplified DNA region that includes the nucleotide sequence encoding mITF₂₃₋₈₁, resulting in pPICpre-mITF₂₃₋₈₁ (also referred to as pPICpre-mITF). The PCR-amplified region corresponds to SEQ ID NO: 125.

FIG. 21A is the amino acid sequence of the MFα presequence-hITF₁₅₋₇₃ fusion (also referred to as MFαpre-hITF₁₅₋₇₃ or MFαpre-hITF) (SEQ ID NO: 316). The MFα presequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and hITF₁₅₋₇₃ is shown in bold.

FIG. 21B is the amino acid sequence of the MFα presequence-pITF₂₂₋₈₀ fusion (also referred to as MFαpre-pITF₂₂₋₈₀ or MFαpre-pITF) (SEQ ID NO: 317). The MFα presequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and pITF₂₂₋₈₀ is shown in bold.

FIG. 21C is the amino acid sequence of the MFα presequence-dITF₂₂₋₈₀ fusion (also referred to as MFαpre-dITF₂₂₋₈₀ or MFαpre-dITF) (SEQ ID NO: 318). The MFα presequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and dITF₂₂₋₈₀ is shown in bold.

FIG. 21D is the amino acid sequence of the MFα presequence-rITF₂₃₋₈₁ fusion (also referred to as MFαpre-rITF₂₃₋₈₁ or MFαpre-rITF) (SEQ ID NO: 319). The MFα presequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and rITF₂₃₋₈₁ is shown in bold.

FIG. 21E is the amino acid sequence of the MFα presequence-mITF₂₃₋₈₁ fusion (also referred to as MFαpre-mITF₂₃₋₈₁ or MFαpre-mITF) (SEQ ID NO: 320). The MFα presequence is shown in italics, the KEX2 proteolytic cleavage site is underlined, and mITF₂₃₋₈₁ is shown in bold.

FIG. 22 is an image of a silver-stained SDS-PAGE gel characterizing four pPICpre-hITF₁₅₋₇₃ transformants in the Pichia pastoris GS115 strain. The position of hITF₁₅₋₇₃ is indicated by an arrow.

FIG. 23A is an image of a Coomassie-stained SDS-PAGE gel characterizing nine pPICGIco-hITF₁₅₋₇₃ transformants in the Pichia pastoris GS115 strain.

FIG. 23B is an image of a Western blot characterizing nine pPICGIco-hITF₁₅₋₇₃ transformants in the Pichia pastoris GS115 strain. The position of hITF₁₅₋₇₃ is indicated by an arrow.

FIG. 24A is an image of a Coomassie-stained SDS-PAGE gel characterizing the pPICGIco-hITF₁₅₋₇₃ transformant clone 3-24 in the Pichia pastoris GS115 strain.

FIG. 24B is an image of a Western blot characterizing the pPICGIco-hITF₁₅₋₇₃ transformant clone 3-24 in the Pichia pastoris GS115 strain. The position of hITF₁₅₋₇₃ is indicated by an arrow.

DETAILED DESCRIPTION OF THE INVENTION

A vector designed for expression in P. pastoris is shown in FIG. 3, and the corresponding protein product is shown in FIG. 2. This construct includes an N-terminal mating factor α (MFα) signal sequence fused to residues 15-73 of human ITF. The MFα signal sequence is cleaved in vivo, and the cleaved protein is secreted from the cell into the expression medium. Signal peptidase cleaves at the junction between the MFα pre sequence and the MFα pro sequence, while KEX2 cleaves near the N-terminal end of the hITF₁₅₋₇₃ polypeptide, between residues Arg and Glu (FIG. 2). The protease STE13 cleaves the remaining Glu-Ala sequence from the ITF₁₅₋₇₃ polypeptide, but it does so only with approximately 70% efficiency, resulting in approximately 30% of the expression product containing an extraneous dipeptide fusion at the N-terminal end (FIG. 1). These extraneous residues result in greater heterogeneity and potentially greater antigenicity of the expression product. In addition, because hITF has a cysteine residue near the C-terminal end, causing the peptide to form dimers by disulfide-bonding, the complexity in the resulting preparation is increased by the presence of two homodimers (hITF₁₅₋₇₃:hITF₁₅₋₇₃ and EA-hITF₁₅₋₇₃:EA-hITF₁₅₋₇₃) and a heterodimer (EA-hITF₁₅₋₇₃:hITF₁₅₋₇₃). This reduces the yield of the native homodimeric hITF₁₅₋₇₃.
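The effect of incomplete STE13 processing on dimer composition can be estimated with a short calculation. The sketch below assumes, purely for illustration, that monomers pair at random and independently of whether they carry the Glu-Ala extension. This pairing assumption and the function name dimer_fractions are not part of the disclosure.

```python
def dimer_fractions(processed: float) -> dict:
    """Expected dimer fractions under random pairing of monomers.

    processed: fraction of monomers from which STE13 has removed Glu-Ala.
    """
    if not 0.0 <= processed <= 1.0:
        raise ValueError("processed must be between 0 and 1")
    unprocessed = 1.0 - processed
    return {
        "hITF:hITF": processed ** 2,                     # native homodimer
        "EA-hITF:hITF": 2.0 * processed * unprocessed,   # heterodimer
        "EA-hITF:EA-hITF": unprocessed ** 2,             # extended homodimer
    }


# Approximately 70% STE13 efficiency, as stated in the text.
fractions = dimer_fractions(0.70)
```

Under the random-pairing assumption, 70% processing at the monomer level corresponds to a native homodimer fraction of 0.7 × 0.7 = 0.49, so roughly half of the dimers would contain at least one extended chain.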

In order to facilitate the preparation of expression vectors capable of producing ITF free of extraneous residues, the vector pPICGIco is created (FIGS. 5 and 11A-11B). pPICGIco, which encodes the MFα signal sequence and has the sequence of SEQ ID NO: 2, is generated by linearizing the plasmid pPIC9 (Invitrogen) (FIG. 4) with the restriction endonucleases Xho I and SnaB I in accordance with the vendor's instructions (New England Biolabs, Inc.). This reaction eliminates the DNA segment between the Xho I and SnaB I sites, which codes for the KEX2 recognition sequence and the Glu-Ala (EA) spacer. The 5′-overhang is filled in with DNA polymerase in the presence of dNTPs to produce blunt ends. The blunt-ended vector is circularized by blunt-end ligation in the presence of DNA ligase (New England Biolabs, Inc.). The DNA preparation is then transformed into a bacterial host (E. coli HB101), and transformants are selected on LB agar plates containing ampicillin. The resulting vector, pPICGIco, is isolated from these transformants and identified by restriction endonuclease analysis, which tests for loss of the SnaB I site and retention of the Xho I and EcoR I sites in the multiple cloning site.
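The restriction analysis described above can be mirrored in silico before any bench work. The following is a minimal sketch, assuming the candidate vector sequence is available as a plain string. The function names are hypothetical, the SnaB I recognition sequence (TACGTA) is supplied from general knowledge rather than from this document, and the sketch checks only for presence or absence of sites anywhere in the plasmid, not specifically within the multiple cloning site.

```python
RECOGNITION_SITES = {
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "SnaBI": "TACGTA",  # assumption: not stated in this document
}


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site in a circular sequence.

    All three sites used here are palindromic, so scanning one strand suffices.
    """
    seq = sequence.upper()
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    circular = seq + seq[:len(site) - 1]
    return sum(1 for i in range(len(seq)) if circular.startswith(site, i))


def consistent_with_ppicgico(sequence: str) -> bool:
    """True if SnaB I is absent and Xho I and EcoR I are each present."""
    counts = {name: count_sites(sequence, site)
              for name, site in RECOGNITION_SITES.items()}
    return counts["SnaBI"] == 0 and counts["XhoI"] >= 1 and counts["EcoRI"] >= 1
```

A check of this kind only screens for the expected digestion pattern; sequencing across the junction would still be needed to confirm the construct.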

It is also desirable to create fusion proteins in which the MFα presequence is fused to the protein of interest, rather than the full MFα prepropeptide sequence. Thus, a second expression vector encoding the MFα presequence, pPICpre, is created based on pPICGIco (FIGS. 8 and 12A-12C). pPICpre, which has the sequence of SEQ ID NO: 5, is generated by linearizing pPICGIco with the restriction endonucleases BamH I and Xho I in accordance with the vendor's instructions (New England Biolabs, Inc.). This reaction eliminates the DNA segment between the BamH I and Xho I sites that code for the MFα signal sequence. An insert containing the MFα presequence is prepared by PCR reaction using primers having the sequences of SEQ ID NO: 241 and SEQ ID NO: 242, with pPICGIco used as a template (FIG. 12B). The resulting fragment has the sequence of SEQ ID NO: 131. This PCR fragment is cleaved with the restriction enzymes BamH I and Xho I to generate ends compatible with subcloning into pPICGIco previously cleaved with BamH I and Xho I. The linearized pPICGIco vector and PCR insert are ligated in the presence of DNA ligase. The DNA preparation is transformed into a bacterial host (E. coli HB101) and transformants selected from LB agar plates containing ampicillin. The resulting vector, pPICpre, is isolated from these transformants.

Below, examples are described in which the vectors pPICGIco or pPICpre are utilized to prepare numerous improved ITF expression vectors of the invention. The examples are provided for the purpose of illustrating the invention and are not meant to limit the invention in any way.

EXAMPLE 1 Yeast Expression Vectors for Production of Mammalian ITF

In order to generate an expression vector encoding MFα-hITF₁₅₋₇₃, the following protocol is followed. Total RNA isolated from human intestine, which includes RNA molecules having the sequence of SEQ ID NO: 101, is used as a template in an RT-PCR reaction including Taq polymerase (New England Biolabs, Inc.) and primers having the sequence of SEQ ID NO: 201 and SEQ ID NO: 202. This reaction results in a PCR product having the sequence of SEQ ID NO: 106.

The resulting PCR product is then subcloned into the bacterial plasmid vector pCR2.1 (Invitrogen) using standard ligation reaction conditions in the presence of T4 DNA ligase (New England Biolabs, Inc.). The resulting clone, pCR2.1-hITF, may be sequenced using standard M13 primers adjacent to the cloning site.

Next, using pCR2.1-hITF as the template, a nucleotide sequence encoding hITF₁₅₋₇₃, with a KEX2 recognition sequence operably linked to the N terminus of hITF₁₅₋₇₃, may be obtained by performing a PCR reaction using Taq polymerase and primers having the sequence of SEQ ID NO: 211 and SEQ ID NO: 212. This reaction results in a PCR product having the sequence of SEQ ID NO: 111.

The resulting PCR product includes a Xho I site and KEX2 recognition sequence at its 5′ end, and an EcoR I site at its 3′ end. This PCR product is then digested with the restriction endonucleases Xho I and EcoR I. In a separate reaction, the vector pPICGIco is similarly digested with Xho I and EcoR I. The digested PCR product is subcloned into the linearized pPICGIco vector using standard ligation reaction conditions in the presence of T4 DNA ligase. The resulting clone, pPICGIco-hITF₁₅₋₇₃ (FIG. 6), is identified by restriction endonuclease mapping and DNA sequencing.
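The sequence features named above can be checked directly against the primers of SEQ ID NO: 211 and SEQ ID NO: 212 as listed in Table 2. The following is a minimal sketch that uses the standard genetic code and assumes the reading frame begins at the first base of the Xho I site; the helper names translate and reverse_complement are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

XHO_I = "CTCGAG"
ECO_RI = "GAATTC"


def translate(dna: str) -> str:
    """Translate complete codons from the first base; ignore a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


forward_primer = "CTCGAGAAAAGAGAGGAGTACGTGGGCCTGT"  # SEQ ID NO: 211
reverse_primer = "GAATTCTCAGAAGGTGCATTCTGCT"        # SEQ ID NO: 212

# Assumed frame: Xho I site (Leu-Glu), then Lys-Arg, then the ITF residues.
forward_peptide = translate(forward_primer)
# Sense-strand view of the 3' end of the PCR product.
reverse_sense = reverse_complement(reverse_primer)

print(forward_primer.startswith(XHO_I), forward_peptide)
print(reverse_primer.startswith(ECO_RI), reverse_sense)
```

In this frame the forward primer reads Leu-Glu-Lys-Arg followed directly by Glu-Glu-Tyr-Val-Gly-Leu, so the Lys-Arg dibasic pair abuts the first ITF residue with no Glu-Ala spacer. The sense-strand view of the reverse primer ends with a TGA stop codon immediately followed by the EcoR I site.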

Other embodiments of the invention may be generated by following a similar protocol (see, e.g., FIGS. 7, 9, and 10). For example, vectors expressing fusion proteins containing alternative fragments of human ITF may be created. In addition, ITF from different species may be used. Desirable embodiments include, but are not limited to, vectors expressing fusion proteins containing the following: hITF₁₋₇₃, hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, hITF₂₅₋₇₃, pITF₁₋₈₀, pITF₂₂₋₈₀, dITF₁₋₈₀, dITF₂₂₋₈₀, rITF₁₋₈₁, rITF₂₃₋₈₁, mITF₁₋₈₁, and mITF₂₂₋₈₁.

Table 1 lists DNA sequences used in generating several desirable embodiments of the invention (see FIGS. 13A-21E). For each embodiment represented therein, the sequences shown may be substituted into the protocol described above for generating a vector expressing MFα-hITF₁₅₋₇₃. Embodiments 1-10 make use of the pPICGIco vector in the final subcloning step, while embodiments 11-15 instead make use of the pPICpre vector in the final subcloning step.

TABLE 1 (entries in columns B through H are SEQ ID NOs)

A           B         C         D         E         F         G         H         I
Embodiment  Template  1st 5′    1st 3′    1st PCR   2nd 5′    2nd 3′    2nd PCR   Resulting vector
#                     primer    primer    product   primer    primer    product   expresses:
1           101       201       202       106       211       212       111       MFα-hITF₁₅₋₇₃
2           102       203       204       107       213       214       112       MFα-pITF₂₂₋₈₀
3           103       205       206       108       215       216       113       MFα-dITF₂₂₋₈₀
4           104       207       208       109       217       218       114       MFα-rITF₂₃₋₈₁
5           105       209       210       110       219       220       115       MFα-mITF₂₃₋₈₁
6           101       201       202       106       221       222       116       MFα-hITF₁₋₇₃
7           102       203       204       107       223       224       117       MFα-pITF₁₋₈₀
8           103       205       206       108       225       226       118       MFα-dITF₁₋₈₀
9           104       207       208       109       227       228       119       MFα-rITF₁₋₈₁
10          105       209       210       110       229       230       120       MFα-mITF₁₋₈₁
11          101       201       202       106       231       232       121       MFαpre-hITF₁₅₋₇₃
12          102       203       204       107       233       234       122       MFαpre-pITF₂₂₋₈₀
13          103       205       206       108       235       236       123       MFαpre-dITF₂₂₋₈₀
14          104       207       208       109       237       238       124       MFαpre-rITF₂₃₋₈₁
15          105       209       210       110       239       240       125       MFαpre-mITF₂₃₋₈₁

Table 2 lists primer sequences used in the protocols described above.

TABLE 2

Primer ID        Sequence (5′ to 3′)
SEQ ID NO: 201   CTGCCAGAGCGCTCTGCATG
SEQ ID NO: 202   CTGGAGGTGCCTCAGAAGGT
SEQ ID NO: 203   ATGGAGGCCAGGATGTTCTG
SEQ ID NO: 204   TCAGAAGGTGCATTCTGTTT
SEQ ID NO: 205   AACGATCTCTGAGCGGTCGG
SEQ ID NO: 206   TGCTTCAAAATCTGCATTCT
SEQ ID NO: 207   TGCTGCCATGGAGACCAGAG
SEQ ID NO: 208   CTGGAGCCTGGACAGCTTCA
SEQ ID NO: 209   AGCTTGCCTGCTGCCATGGA
SEQ ID NO: 210   TCCTGGAGCCTGGACAGCTT
SEQ ID NO: 211   CTCGAGAAAAGAGAGGAGTACGTGGGCCTGT
SEQ ID NO: 212   GAATTCTCAGAAGGTGCATTCTGCT
SEQ ID NO: 213   CTCGAGAAAAGAGCCGGGGAGTATGTGGGC
SEQ ID NO: 214   GAATTCTCAGAAGGTGCATTCTGTTT
SEQ ID NO: 215   CTCGAGAAAAGAGTGGCTTACCAGGGCCTG
SEQ ID NO: 216   GAATTCTCAAAATCTGCATTCTGT
SEQ ID NO: 217   CTCGAGAAAAGACAGGAATTTGTTGGCCTA
SEQ ID NO: 218   GAATTCTCAAAATGTACATTCTGT
SEQ ID NO: 219   CTCGAGAAAAGAGCAGATTACGTTGGCCTG
SEQ ID NO: 220   GAATTCTCAAAATGTGCATTCTGT
SEQ ID NO: 221   CTCGAGAAAAGAATGCTGGGGCTGGTCCTGGC
SEQ ID NO: 222   GAATTCTCAGAAGGTGCATTCTGCT
SEQ ID NO: 223   CTCGAGAAAAGAATGGAGGCCAGGATGTTC
SEQ ID NO: 224   GAATTCTCAGAAGGTGCATTCTGTTT
SEQ ID NO: 225   CTCGAGAAAAGAATGGAGGCCAGAGTGCT
SEQ ID NO: 226   GAATTCTCAAAATCTGCATTCTGT
SEQ ID NO: 227   CTCGAGAAAAGAATGGAGACCAGAGCCTTC
SEQ ID NO: 228   GAATTCTCAAAATGTACATTCTGT
SEQ ID NO: 229   CTCGAGAAAAGAGCAGATTACGTTGGCCTG
SEQ ID NO: 230   GAATTCTCAAAATGTGCATTCTGT
SEQ ID NO: 231   CTCGAGAAAAGAGAGGAGTACGTGGGCCTGT
SEQ ID NO: 232   GAATTCTCAGAAGGTGCATTCTGCT
SEQ ID NO: 233   CTCGAGAAAAGAGCCGGGGAGTATGTGGGC
SEQ ID NO: 234   GAATTCTCAGAAGGTGCATTCTGTTT
SEQ ID NO: 235   CTCGAGAAAAGAGTGGCTTACCAGGGCCTG
SEQ ID NO: 236   GAATTCTCAAAATCTGCATTCTGT
SEQ ID NO: 237   CTCGAGAAAAGACAGGAATTTGTTGGCCTA
SEQ ID NO: 238   GAATTCTCAAAATGTACATTCTGT
SEQ ID NO: 239   CTCGAGAAAAGAGCAGATTACGTTGGCCTG
SEQ ID NO: 240   GAATTCTCAAAATGTGCATTCTGT
SEQ ID NO: 241   GGATCCAAACGATGAGA
SEQ ID NO: 242   CTCGAGAGCAGCTAATGCGGATGC

Variants of the above-described embodiments of the invention are possible. For example, alternative protease cleavage sites may be used in place of the KEX2 site. Cleavage sites recognized by any of the following enzymes would be useful: yeast aspartic protease (Yap3), Type IV dipeptidyl aminopeptidase (DPAP), yeast glycosyl-phosphatidylinositol-linked aspartyl protease (Mkc7), pepsin, trypsin, chymotrypsin, and subtilisin. Yap3 cleaves immediately C-terminal to Arg residues (Bourbonnais et al., Biochimie, 76:226-233, 1994) and cleaves following Arg-Arg and Lys-Arg sites, though it cleaves poorly after three or more consecutive basic residues (Ledgerwood et al., FEBS Lett., 383:67-71, 1996); DPAP cleaves immediately C-terminal to Ala or Pro residues, including Leu-Pro and Val-Pro sites (Brenner et al., Proc Natl Acad Sci U.S.A., 89:922-926, 1992); Mkc7 cleaves immediately C-terminal to Lys-Arg (Komano et al., Proc Natl Acad Sci USA, 92:10752-10756, 1995); pepsin cleaves immediately C-terminal to Tyr, Phe, or Trp residues; trypsin cleaves immediately C-terminal to Arg or Lys residues; and chymotrypsin cleaves immediately C-terminal to Tyr, Phe, or Trp residues.
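The cleavage rules summarized above can be expressed as simple sequence patterns. The sketch below is a deliberately simplified model that encodes only the residue preferences stated in this paragraph; it ignores sequence context, folding, and reaction conditions. The peptide LEKREEYVGL is an illustrative junction sequence, and the names RULES and cleave are hypothetical.

```python
import re

# Residues after which each protease is stated above to cleave (simplified).
RULES = {
    "trypsin": r"[RK]",        # C-terminal to Arg or Lys
    "chymotrypsin": r"[YFW]",  # C-terminal to Tyr, Phe, or Trp
    "pepsin": r"[YFW]",        # C-terminal to Tyr, Phe, or Trp
    "Mkc7": r"KR",             # C-terminal to Lys-Arg
}


def cleave(peptide: str, protease: str) -> list[str]:
    """Split a one-letter peptide string immediately after each matching site."""
    pattern = re.compile(RULES[protease])
    fragments = []
    start = 0
    for match in pattern.finditer(peptide):
        cut = match.end()
        if cut < len(peptide):  # a site at the very C terminus yields no new fragment
            fragments.append(peptide[start:cut])
            start = cut
    fragments.append(peptide[start:])
    return fragments


junction = "LEKREEYVGL"  # illustrative: spacer residues, Lys-Arg, then ITF residues
mkc7_fragments = cleave(junction, "Mkc7")
trypsin_fragments = cleave(junction, "trypsin")
chymotrypsin_fragments = cleave(junction, "chymotrypsin")
print(mkc7_fragments, trypsin_fragments, chymotrypsin_fragments)
```

A pattern-based model of this kind predicts only candidate sites on an unfolded chain. It cannot indicate whether a site within a folded ITF domain is actually accessible to the enzyme.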

Vectors of the invention are designed so that, as with the KEX2 site in the vectors described above, no extraneous residues are present between an alternative protease cleavage site and the sequence encoding the ITF polypeptide to be expressed. Cleavage of the resulting MFα-ITF fusion protein may occur in vivo prior to secretion; for example, this could occur with cleavage sites recognized by proteases that occur naturally in the host, such as Yap3, DPAP, or Mkc7. Alternatively, uncleaved fusion protein may be secreted by the host cell if no endogenous enzyme recognizes the cleavage site of the fusion construct; in this case, cleavage may be achieved in vitro in a reaction chamber by contacting purified secreted fusion protein with a purified endoprotease such as pepsin, trypsin, chymotrypsin, subtilisin, or other enzymes. Mature ITF is known to be resistant to the action of such proteases in solution (Kinoshita et al., Mol Cell Biol., 20:4680-4690, 2000) and so will remain intact following the reaction. Subsequently, the resulting ITF polypeptide may be purified away from protease and reaction products.

Additionally, alternative embodiments are possible in which an N-terminal fusion sequence other than an MFα signal sequence or an MFα presequence is used. Any N-terminal fusion sequence that results in secretion of the expressed ITF polypeptide is useful in the methods of the invention. N-terminal fusion sequences that do not result in secretion are also possible; in such cases, the cells in which expression occurs would be lysed prior to protein extraction and purification. Signal and leader sequences for yeast expression include the yeast K28 virus preprotoxin secretion signal sequence (Eiden-Plach et al., Appl Environ Microbiol., 70:961-966, 2004), Saccharomyces cerevisiae acid phosphatase signal sequence (Akeboshi et al., Biosci Biotechnol Biochem., 67:1149-1153, 2003), Aspergillus niger isopullulanase signal sequence (Akeboshi et al., Biosci Biotechnol Biochem., 67:1149-1153, 2003), chimeric yeast alpha factor and Streptomyces mobaraensis transglutaminase propeptide (Yurimoto et al., Biosci Biotechnol Biochem., 68:2058-2069, 2004), modified signal peptide for Rhizopus oryzae glucoamylase (Liu et al., Biochem Biophys Res Commun., 326:817-824, 2005), Kluyveromyces lactis killer toxin signal sequence (Tokunaga et al., Yeast, 9:379-397, 1993), and Map2 secretion sequence (Giga-Hama et al., Biotech Appl Biochem., 30:235-244, 1999).

In each instance described above, as well as in other embodiments of the invention, standard methods of protein purification may be used. See, for example, the purification methods and activity assays described in Thim et al., Biochemistry, 34:4757-4764, 1995, and U.S. Ser. No. 10/698,572, each of which is hereby incorporated by reference.

Host organisms for yeast expression vectors may be chosen, e.g., from among Pichia pastoris, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida boidinii, and Candida glabrata (Yurimoto et al., Biosci Biotechnol Biochem., 68:2058-2069, 2004; Eiden-Plach et al., Appl Environ Microbiol., 70:961-966, 2004). Desirably, the Pichia pastoris GS115 (his4-) strain is used. Other host organisms may also be used.

Organisms produced according to the invention may be employed in industrial scale production of human ITF, yielding product in quantities and for applications hitherto unattainable.

EXAMPLE 2 Expression Of pPICpre-hITF₁₅₋₇₃

The pPICpre-hITF₁₅₋₇₃ construct was transformed into Pichia pastoris strain GS115(Mut+), and transformants were selected. Expression of hITF₁₅₋₇₃ (molecular weight: ˜6.5 kDa) secreted into the medium after 96-120 hours of growth in shake flasks was assessed using SDS-PAGE (FIG. 22). A construct expressing EA-hITF₁₅₋₇₃ was used as a control. Four GS115 clones were tested, and clone pre-ITF #1 showed expression of a band comparable in size to that of hITF₁₅₋₇₃. These data demonstrate that the pPICpre-hITF₁₅₋₇₃ construct directs expression of hITF₁₅₋₇₃.

EXAMPLE 3 Expression of pPICGIco-hITF₁₅₋₇₃

The pPICGIco-hITF₁₅₋₇₃ construct was transformed into Pichia pastoris strain GS115(Mut+), and transformants were selected. Expression of hITF₁₅₋₇₃ (molecular weight: ˜6.5 kDa) secreted into the medium after 120 hours of growth in shake flasks was assessed using SDS-PAGE (FIGS. 23A and 24A) and Western blot (FIGS. 23B and 24B). A construct expressing EA-hITF₁₅₋₇₃ was used as a control. Nine GS115 clones were tested; clones 3-19, 3-24, 3-25, and 3-26 showed expression of a band comparable in size to that of hITF₁₅₋₇₃, and Western blotting confirmed that this band contained hITF. These data demonstrate that the pPICGIco-hITF₁₅₋₇₃ construct directs expression of hITF₁₅₋₇₃.

Use

The invention provides ITF expression vectors and methods of their use for treating epithelial cell lesions. Lesions amenable to treatment using the expression products and methods of this invention include epithelial lesions of the dermis and epidermis (skin), alimentary canal including the epithelia of the oral cavity, esophagus, stomach, small and large intestines (anal sphincter, rectum, and colon, particularly the sigmoid colon and the descending colon), genitourinary tract (particularly the vaginal canal, cervix, and uterus), trachea, lungs, nasal cavity, and the eye.

Other Embodiments

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims. 

1. An expression vector comprising a nucleic acid encoding a biologically active intestinal trefoil factor (ITF) polypeptide, wherein said expression vector encodes a polypeptide comprising: i. an N-terminal fusion sequence; ii. a protease cleavage site; and iii. said ITF polypeptide, wherein said cleavage site is contiguous with said ITF polypeptide and is recognized by a protease that cleaves immediately C-terminal to said cleavage site.
 2. The expression vector of claim 1, wherein said N-terminal fusion sequence comprises an MFα signal sequence.
 3. The expression vector of claim 1, wherein said N-terminal fusion sequence comprises an MFα presequence.
 4. The expression vector of claim 1, wherein said cleavage site is recognized by the protease KEX2.
 5. The expression vector of claim 4, wherein said cleavage site comprises a nucleic acid having the sequence of SEQ ID NO: 401.
 6. The expression vector of claim 1, wherein said cleavage site is recognized by yeast aspartic protease (Yap3), Type IV dipeptidyl aminopeptidase (DPAP), yeast glycosyl-phosphatidylinositol-linked aspartyl protease (Mkc 7), pepsin, trypsin, chymotrypsin, or subtilisin.
 7. The expression vector of claim 1, wherein said ITF polypeptide is hITF₁₅₋₇₃.
 8. The expression vector of claim 7, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 111.
 9. The expression vector of claim 7, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 3.
 10. The expression vector of claim 1, wherein said ITF polypeptide is hITF₁₋₇₃.
 11. The expression vector of claim 10, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 116.
 12. The expression vector of claim 10, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 4.
 13. The expression vector of claim 1, wherein said ITF polypeptide is hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, or hITF₂₅₋₇₃.
 14. The expression vector of claim 1, wherein said ITF polypeptide is pITF₂₂₋₈₀ or pITF₁₋₈₀.
 15. The expression vector of claim 14, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 112 or SEQ ID NO: 117.
 16. The expression vector of claim 1, wherein said ITF polypeptide is dITF₂₂₋₈₀ or dITF₁₋₈₀.
 17. The expression vector of claim 16, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 113 or SEQ ID NO: 118.
 18. The expression vector of claim 1, wherein said ITF polypeptide is rITF₂₃₋₈₁ or rITF₁₋₈₁.
 19. The expression vector of claim 18, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 114 or SEQ ID NO: 119.
 20. The expression vector of claim 1, wherein said ITF polypeptide is mITF₂₃₋₈₁ or mITF₁₋₈₁.
 21. The expression vector of claim 20, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 115 or SEQ ID NO: 120.
 22. The expression vector of claim 2 or claim 3, wherein said cleavage site is recognized by the protease KEX2.
 23. The expression vector of claim 22, wherein said cleavage site comprises a nucleic acid having the sequence of SEQ ID NO: 401.
 24. The expression vector of claim 2 or claim 3, wherein said cleavage site is recognized by yeast aspartic protease (Yap3), Type IV dipeptidyl aminopeptidase (DPAP), yeast glycosyl-phosphatidylinositol-linked aspartyl protease (Mkc 7), pepsin, trypsin, chymotrypsin, or subtilisin.
 25. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is hITF₁₅₋₇₃.
 26. The expression vector of claim 25, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 111.
 27. The expression vector of claim 25, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 3.
 28. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is hITF₁₋₇₃.
 29. The expression vector of claim 28, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 116.
 30. The expression vector of claim 28, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 4.
 31. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, or hITF₂₅₋₇₃.
 32. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is pITF₂₂₋₈₀ or pITF₁₋₈₀.
 33. The expression vector of claim 32, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 112 or SEQ ID NO: 117.
 34. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is dITF₂₂₋₈₀ or dITF₁₋₈₀.
 35. The expression vector of claim 34, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 113 or SEQ ID NO: 118.
 36. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is rITF₂₃₋₈₁ or rITF₁₋₈₁.
 37. The expression vector of claim 36, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 114 or SEQ ID NO: 119.
 38. The expression vector of claim 2 or claim 3, wherein said ITF polypeptide is mITF₂₃₋₈₁ or mITF₁₋₈₁.
 39. The expression vector of claim 38, wherein said vector comprises a nucleic acid having the sequence of SEQ ID NO: 115 or SEQ ID NO: 120.
 40. A cell transformed with the vector of any of claims 1-39.
 41. A composition comprising a cell transformed with the vector of any of claims 1-39 and a cell culture medium.
 42. The composition of claim 41, wherein said cell is Pichia pastoris.
 43. The composition of claim 42, wherein said cell is a (Mut+) GS115 strain.
 44. The composition of claim 42, wherein said cell is a (his4-) GS115 strain.
 45. A method of culturing a cell of claim 40 comprising the steps of: expressing said encoded polypeptide; and recovering a polypeptide comprising intestinal trefoil factor from the culture medium.
 46. The method of claim 45, wherein said polypeptide is secreted from said cell.
 47. The method of claim 46, wherein said expressed polypeptide is proteolytically processed in vivo prior to secretion from said cell.
 48. The method of claim 46, wherein said secreted polypeptide is contacted with a purified proteolytic enzyme in a reaction chamber.
 49. A method for producing a biologically active ITF polypeptide, said method comprising: culturing yeast transformants containing recombinant plasmids encoding a polypeptide comprising an ITF polypeptide, wherein said yeast produce and secrete said ITF polypeptide unaccompanied by an extraneous EA amino acid sequence; and isolating and purifying said ITF polypeptide.
 50. The method of claim 49, wherein said ITF polypeptide is hITF₁₅₋₇₃.
 51. The method of claim 49, wherein said ITF polypeptide is hITF₁₋₇₃, hITF₂₁₋₆₂, hITF₂₁₋₇₀, hITF₂₁₋₇₂, hITF₂₁₋₇₃, hITF₂₂₋₆₂, hITF₂₂₋₇₀, hITF₂₂₋₇₂, hITF₂₂₋₇₃, hITF₂₅₋₆₂, hITF₂₅₋₇₀, hITF₂₅₋₇₂, hITF₂₅₋₇₃, pITF₁₋₈₀, pITF₂₂₋₈₀, dITF₁₋₈₀, dITF₂₂₋₈₀, rITF₁₋₈₁, rITF₂₃₋₈₁, mITF₁₋₈₁, or mITF₂₂₋₈₁.
 52. A polypeptide comprising: i. an N-terminal fusion sequence; ii. a protease cleavage site; and iii. a biologically active ITF polypeptide, wherein said cleavage site is contiguous with said ITF polypeptide and is recognized by a protease that cleaves immediately C-terminal to said cleavage site.
 53. The polypeptide of claim 52, wherein the sequence of said polypeptide comprises SEQ ID NO: 306, SEQ ID NO: 307, SEQ ID NO: 308, SEQ ID NO: 309, SEQ ID NO: 310, SEQ ID NO: 311, SEQ ID NO: 312, SEQ ID NO: 313, SEQ ID NO: 314, SEQ ID NO: 315, SEQ ID NO: 316, SEQ ID NO: 317, SEQ ID NO: 318, SEQ ID NO: 319, or SEQ ID NO: 320.
 54. An expression vector comprising a nucleic acid having the sequence of SEQ ID NO: 2.
 55. An expression vector comprising a nucleic acid having the sequence of SEQ ID NO: 5.